ipayipi

Opening the pipeline: imbibe data

13 June, 2025

Summary

‘ipayipi’ was built to process time-series data, drawing on the functionality and speed of ‘data.table’ to handle large datasets. This vignette introduces batch processing of time-series data with ipayipi, covering the extraction and standardisation of logger data. Specifically, the six steps outlined in this vignette are:

  1. Load R ipayipi (install from GitHub),
  2. Initiate pipeline,
  3. Imbibe data,
  4. Standardise data,
  5. Make station files, and
  6. Identify gaps.

An overview of the code that executes these six steps is shown below. This is all the code that needs to be run each time new data become available. For more details and options for each step, see the respective function documentation/help files.

## 1. load package ----
library(ipayipi)

## 2. initiate pipeline ----
# define our general working directory: wherever data is going to be processed ...
pd <- "ipayipi/data-raw/eg_data/met_eg"

# setup pipeline directory structure
pipe_house <- ipip_house(pipe_house_dir = pd)

## 3. read & imbibe data ----
logger_data_import_batch(pipe_house)
imbibe_raw_batch(pipe_house, file_ext_in = ".dat",
  data_setup = ipayipi::cs_toa5
)

## 4. standardise data ----
header_sts(pipe_house)
phenomena_sts(pipe_house)

## store standardised data ----
# transfer standardised data to the d3_nomvet_room
transfer_sts_files(pipe_house)

## 5. append standardised data files ----
append_station_batch(pipe_house)

## 6. identify gaps ----
gap_eval_batch(pipe_house)

Introduction

Organising time-series data from loggers, such as weather stations or groundwater sensors, requires a substantial amount of data structuring. ipayipi makes this process dynamic and structured, so that processing down (and back up) a pipeline is traceable. Importantly, ipayipi preserves raw-data integrity and initiates archiving, so that raw data are available in a standardised format and the processing of these data can be updated.

The package can be found on GitHub and can be installed using the ‘devtools’ package: devtools::install_github("SAEONData/ipayipi").

Initiate pipeline: the ‘pipe_house’

The ‘pipe_house’ is the directory in which we initiate a data-pipeline structure. To keep things simple, we will only use our ‘pipe_house’ for specific data streams or data families. In this vignette, the data stream consists of meteorological data gathered by a weather-station data logger. ipayipi handles most time-series data formats that can be read from flat files into R. Data for this vignette can be downloaded from the package’s raw-data folder on GitHub.

# setting up the 'pipe_house'
## general pipeline working directory
pd <- "ipayipi/data-raw/eg_data/met_eg"

## initiate pipeline
pipe_house <- ipip_house(pipe_house_dir = pd)
print(pipe_house)
## $r
##                                   r 
## "ipayipi/data-raw/eg_data/met_eg/r" 
## 
## $d1_source_room
##                                   d1_source_room 
## "ipayipi/data-raw/eg_data/met_eg/d1_source_room" 
## 
## $d2_wait_room
##                                   d2_wait_room 
## "ipayipi/data-raw/eg_data/met_eg/d2_wait_room" 
## 
## $d3_nomvet_room
##                                   d3_nomvet_room 
## "ipayipi/data-raw/eg_data/met_eg/d3_nomvet_room" 
## 
## $d4_ipip_room
##                                   d4_ipip_room 
## "ipayipi/data-raw/eg_data/met_eg/d4_ipip_room" 
## 
## $d5_dta_out
##                                   d5_dta_out 
## "ipayipi/data-raw/eg_data/met_eg/d5_dta_out" 
## 
## $reports
##                                   reports 
## "ipayipi/data-raw/eg_data/met_eg/reports" 
## 
## $d0_raw_room
##                                   d0_raw_room 
## "ipayipi/data-raw/eg_data/met_eg/d0_raw_room" 
## 
## $pipe_house_dir
## [1] "ipayipi/data-raw/eg_data/met_eg"

What has ipip_house() done? Among the directories listed in the output above, it has created the following, if they don’t already exist*:

  1. ‘d1_source_room’: where new logger data is going to be made available.
  2. ‘d2_wait_room’: waiting room for imbibing data into the pipeline.
  3. ‘d3_nomvet_room’: where standardised/corrected logger files get archived (nomenclature vetted).
  4. ‘d4_ipip_room’: where data get appended into contiguous, single-station records and processed.
  5. ‘d0_raw_room’: where ‘unaltered’ raw data gets pushed.

*NB! Running this function will not overwrite existing data.
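Why re-running the set-up is safe can be sketched in base R. The helper make_pipe_dirs() below is hypothetical (it is not part of ipayipi and not how ipip_house() is implemented): it creates each "room" only if it is missing, so files already inside a room are left alone. The room names follow the output above; the rest is an assumption for illustration.

```r
# Hypothetical helper, not part of ipayipi: create pipeline "rooms" only if
# they are missing, so existing directories and files are never overwritten.
make_pipe_dirs <- function(pipe_house_dir,
                           rooms = c("d0_raw_room", "d1_source_room",
                                     "d2_wait_room", "d3_nomvet_room",
                                     "d4_ipip_room")) {
  paths <- file.path(pipe_house_dir, rooms)
  for (p in paths) {
    if (!dir.exists(p)) dir.create(p, recursive = TRUE)
  }
  # return a named list of paths, similar in spirit to the pipe_house object
  stats::setNames(as.list(paths), rooms)
}

pd_eg <- file.path(tempdir(), "met_eg")
ph <- make_pipe_dirs(pd_eg)
writeLines("keep me", file.path(ph$d1_source_room, "existing.dat"))

# a second call leaves the existing file untouched
ph <- make_pipe_dirs(pd_eg)
readLines(file.path(ph$d1_source_room, "existing.dat"))
```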

Imbibe data

In this step, data are pulled from the pipeline’s data source, that is, the ‘source directory’ (pipe_house$d1_source_room), into the ‘waiting room’ (pipe_house$d2_wait_room). The example data contain two years of Campbell Scientific logger text files derived from sensors on a SAEON meteorological station in northern Maputaland, South Africa.

# copy data from source to the d2_wait_room
logger_data_import_batch(pipe_house = pipe_house,
  file_ext = ".dat", # the file extension (with period) of raw data files
  verbose = FALSE, # set to TRUE to report progress in the terminal
  unwanted = "02.2022" # exclude Feb 2022 files to create a data 'gap'
)
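The file selection done during import can be pictured as a filter on file names. The sketch below uses a hypothetical select_raw_files() helper (not ipayipi's logger_data_import_batch()) and assumes that ‘unwanted’ is matched as a literal substring of file names; how ipayipi itself matches this argument is not shown here. The selected files are then copied, without overwriting, into a stand-in for the waiting room.

```r
# Hypothetical helper, not ipayipi's logger_data_import_batch(): list files
# with a given extension and drop any whose name contains 'unwanted'.
select_raw_files <- function(source_dir, file_ext = ".dat", unwanted = NULL) {
  files <- list.files(source_dir, recursive = TRUE)
  keep <- endsWith(tolower(files), tolower(file_ext))
  if (!is.null(unwanted)) {
    # assumption: 'unwanted' is matched literally, not as a regular expression
    keep <- keep & !grepl(unwanted, files, fixed = TRUE)
  }
  files[keep]
}

# toy source and waiting-room directories
src <- file.path(tempdir(), "d1_source_eg")
wait <- file.path(tempdir(), "d2_wait_eg")
dir.create(src, showWarnings = FALSE)
dir.create(wait, showWarnings = FALSE)
for (f in c("Hourly_01.2022.dat", "Hourly_02.2022.dat", "notes.txt")) {
  writeLines("toy", file.path(src, f))
}

to_copy <- select_raw_files(src, file_ext = ".dat", unwanted = "02.2022")
# copy (never overwrite) the selected files into the waiting room
file.copy(file.path(src, to_copy), wait, overwrite = FALSE)
```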

Now that some data are in the ‘d2_wait_room’ directory, we can read them into R. Note the pre-set ‘data_setup’ option for Campbell Scientific TOA5-formatted files, ipayipi::cs_toa5.

imbibe_raw_batch(pipe_house = pipe_house,
  data_setup = ipayipi::cs_toa5, # standard for reading TOA5-formatted files
  record_interval_type = "continuous"
)

For more on data input formats, that is, the ‘data_setup’ argument, see the help files of the imbibe_raw_logger_dt() function (i.e., ?imbibe_raw_logger_dt).

Record-interval type is an important parameter: ipayipi handles continuous, event-based (discontinuous), and mixed time-series data. Record intervals are evaluated using the record_interval_eval() function, and this information is needed in later steps, such as automatically identifying ‘gaps’ (missing data).
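One way to think about record-interval evaluation is to look at the differences between consecutive timestamps. The sketch below is a simplified, hypothetical stand-in for record_interval_eval(), not its implementation: it takes the most common step as the interval, calls the series ‘continuous’ when at least 90% of steps equal that value (an arbitrary threshold chosen here), and otherwise ‘event_based’. It does not attempt the ‘mixed’ type that ipayipi supports.

```r
# Simplified, hypothetical stand-in for record_interval_eval(): infer the
# record interval from the most common step between timestamps.
eval_interval <- function(date_time, tolerance = 0.9) {
  dt <- sort(unique(date_time))
  steps <- as.numeric(diff(dt), units = "secs")
  modal <- as.numeric(names(which.max(table(steps))))
  # share of steps equal to the modal step; 0.9 is an arbitrary threshold
  type <- if (mean(steps == modal) >= tolerance) "continuous" else "event_based"
  interval <- if (modal %% 86400 == 0) {
    paste0(modal / 86400, "_days")
  } else if (modal %% 3600 == 0) {
    paste0(modal / 3600, "_hours")
  } else if (modal %% 60 == 0) {
    paste0(modal / 60, "_mins")
  } else {
    paste0(modal, "_secs")
  }
  list(record_interval_type = type, record_interval = interval)
}

five_min <- seq(as.POSIXct("2021-06-16 11:50:00", tz = "UTC"),
                by = "5 min", length.out = 12)
eval_interval(five_min)

# irregular, event-like timestamps (illustrative only)
events <- as.POSIXct(c("2021-06-16 02:13:00", "2021-06-16 02:41:00",
                       "2021-06-17 15:02:00"), tz = "UTC")
eval_interval(events)$record_interval_type
```

Naming the interval with suffixes such as "5_mins" mirrors the record_interval values used in the nomenclature table below.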

Standardise data

Both file-header information and other phenomena (variable) metadata will now be standardised. The spelling and synonyms of file names and associated header metadata have to be scrutinised first; only once header information has been standardised can we move on to the phenomena. These steps are essential for automating the appending of file records and downstream data correction/processing (e.g., drift correction).

header_sts(pipe_house)

If header_sts() is being run for the first time, or new synonyms are introduced into the pipe_house directory, header_sts() will produce a warning, because the user needs to define new nomenclature standards. Unstandardised names (or columns) have the prefix ‘uz_’. These standards are stored in a file called ‘nomtab.rns’ in the ‘waiting room’. If this file is deleted, a new one will be generated, but the user will have to populate its tables with synonym vocabulary again.

The nomenclature table in the ‘waiting room’ can be updated from ‘csv’ format (or directly in R). If a new synonym is introduced, the file containing the new nomenclature will be skipped in further processing, and a ‘csv’ version of ‘nomtab.rns’ will be copied to the ‘waiting room’ for editing.

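Only the following fields, where they contain NAs, require editing in the ‘nomtab’ ‘csv’: the standardised fields location, station, stnd_title, logger_type, logger_title, record_interval_type, record_interval and table_name. The ‘uz_’ fields hold the unstandardised values read from the logger files.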

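library(kableExtra) # provides kbl() and the table-styling helpers used below
# read a display copy of the nomenclature table and render it as an HTML table
pt <- read.csv("ipayipi/data-raw/eg_data/met_eg/nomtab_display.csv")
kbl(pt) |>
  kable_paper("hover") |>
  kable_styling(font_size = 11) |>
  # shade unstandardised ('uz_') columns red and standardised columns blue
  column_spec(c(1, 7:8, 11), background = "#df9a86") |>
  column_spec(c(2:6, 9:10, 12), background = "#aed8f0") |>
  scroll_box(width = "100%", height = "400px")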
| uz_station | location | station | stnd_title | logger_type | logger_title | uz_record_interval_type | uz_record_interval | record_interval_type | record_interval | uz_table_name | table_name |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mabasa AWS | mcp | mabasa_aws | mcp_mabasa_aws | CR1000X | CR1000 | mixed | 1_days | mixed | 1_days | TableDay | raw_1_days |
| Mabasa AWS | mcp | mabasa_aws | mcp_mabasa_aws | CR1000X | CR1000 | mixed | 1_hours | mixed | 1_hours | TableHour | raw_1_hours |
| Mabasa AWS | mcp | mabasa_aws | mcp_mabasa_aws | CR1000X | CR1000X | continuous | 1_days | continuous | 1_days | TableDay | raw_1_days |
| Mabasa AWS | mcp | mabasa_aws | mcp_mabasa_aws | CR1000X | CR1000X | continuous | 1_hours | continuous | 1_hours | TableHour | raw_1_hours |
| Station2 | mcp | mabasa_aws | mcp_mabasa_aws | CR1000X | CR1000X | continuous | 1_days | continuous | 1_days | TableDay | raw_1_days |
| Station2 | mcp | mabasa_aws | mcp_mabasa_aws | CR1000X | CR1000X | continuous | 1_hours | continuous | 1_hours | TableHour | raw_1_hours |
| Sibayi AWS | mcp | sibayi_aws | mcp_sibayi_aws | CR200X | CR200X | continuous | 1_days | continuous | 1_days | Daily | raw_1_days |
| Sibayi AWS | mcp | sibayi_aws | mcp_sibayi_aws | CR200X | CR200X | continuous | 5_mins | continuous | 5_mins | Five_Minutes | raw_5_mins |
| Sibayi AWS | mcp | sibayi_aws | mcp_sibayi_aws | CR200X | CR200X | continuous | 1_hours | continuous | 1_hours | Hourly_ | raw_1_hours |
| 84923 | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | CR1000 | CR1000 | continuous | 1_days | continuous | 1_days | Daily | raw_1_days |
| 84923 | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | CR1000 | CR1000 | continuous | 5_mins | continuous | 5_mins | Five_mins | raw_5_mins |
| 84923 | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | CR1000 | CR1000 | continuous | 1_hours | continuous | 1_hours | Hourly | raw_1_hours |
| 88296 | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | CR1000 | CR1000 | continuous | 1_days | continuous | 1_days | Daily | raw_1_days |
| 88296 | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | CR1000 | CR1000 | continuous | 5_mins | continuous | 5_mins | Five_mins | raw_5_mins |
| 88296 | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | CR1000 | CR1000 | continuous | 1_hours | continuous | 1_hours | Hourly | raw_1_hours |
| CR1000 - Vasi_Science Centre AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | CR1000 | CR1000 | continuous | 1_days | continuous | 1_days | Daily | raw_1_days |
| CR1000 - Vasi_Science Centre AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | CR1000 | CR1000 | continuous | 5_mins | continuous | 5_mins | Five_mins | raw_5_mins |
| CR1000 - Vasi_Science Centre AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | CR1000 | CR1000 | continuous | 1_hours | continuous | 1_hours | Hourly | raw_1_hours |
| CR1000_Vasi Science Centre AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | CR1000 | CR1000 | continuous | 1_days | continuous | 1_days | Daily | raw_1_days |
| CR1000_Vasi Science Centre AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | CR1000 | CR1000 | continuous | discnt | continuous | discnt | Daily | raw_1_days |
| CR1000_Vasi Science Centre AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | CR1000 | CR1000 | continuous | 5_mins | continuous | 5_mins | Five_mins | raw_5_mins |
| CR1000_Vasi Science Centre AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | CR1000 | CR1000 | continuous | 1_hours | continuous | 1_hours | Hourly | raw_1_hours |
| Science Center AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | CR1000 | CR1000 | continuous | 1_days | continuous | 1_days | Daily | raw_1_days |
| Science Center AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | CR1000 | CR1000 | continuous | 5_mins | continuous | 5_mins | Five_mins | raw_5_mins |
| Science Center AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | CR1000 | CR1000 | continuous | 1_hours | continuous | 1_hours | Hourly | raw_1_hours |
| Science Centre AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | CR1000 | CR1000 | continuous | 1_days | continuous | 1_days | Daily | raw_1_days |
| Science Centre AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | CR1000 | CR1000 | continuous | 5_mins | continuous | 5_mins | Five_mins | raw_5_mins |
| Science Centre AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | CR1000 | CR1000 | mixed | 5_mins | mixed | 5_mins | Five_mins | raw_5_mins |
| Science Centre AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | CR1000 | CR1000 | continuous | 1_hours | continuous | 1_hours | Hourly | raw_1_hours |
| CR200Series | mcp | sibayi_aws | mcp_sibayi_aws | CR200X | NA | continuous | 1_days | continuous | 1_days | Daily | raw_1_days |
| CR200Series | mcp | sibayi_aws | mcp_sibayi_aws | CR200X | NA | continuous | 5_mins | continuous | 5_mins | Five_Minutes | raw_5_mins |
| CR200Series | mcp | sibayi_aws | mcp_sibayi_aws | CR200X | NA | continuous | 1_hours | continuous | 1_hours | Hourly_ | raw_1_hours |
| Sibayi | mcp | sibayi_aws | mcp_sibayi_aws | CR300 | NA | continuous | 1_hours | continuous | 1_hours | Table1 | raw_1_hours |
| Sibayi | mcp | sibayi_aws | mcp_sibayi_aws | CR300 | NA | continuous | 1_days | continuous | 1_days | Table2 | raw_1_days |
| Sibayi | mcp | sibayi_aws | mcp_sibayi_aws | CR300 | NA | continuous | 5_mins | continuous | 5_mins | Table3 | raw_5_mins |
| Sibayi AWS | mcp | sibayi_aws | mcp_sibayi_aws | CR200X | NA | continuous | discnt | continuous | discnt | Daily | raw_1_days |
| Sibayi Camp Mini AWS | mcp | sibayi_aws | mcp_sibayi_aws | CR200X | NA | mixed | 1_days | mixed | 1_days | Daily | raw_1_days |
| Sibayi Camp Mini AWS | mcp | sibayi_aws | mcp_sibayi_aws | CR200X | NA | continuous | 1_days | continuous | 1_days | Daily | raw_1_days |
| Sibayi Camp Mini AWS | mcp | sibayi_aws | mcp_sibayi_aws | CR200X | NA | continuous | 5_mins | continuous | 5_mins | Five_Minutes | raw_5_mins |
| Sibayi Camp Mini AWS | mcp | sibayi_aws | mcp_sibayi_aws | CR200X | NA | continuous | 1_hours | continuous | 1_hours | Hourly_ | raw_1_hours |
| Sibayi Mini AWS | mcp | sibayi_aws | mcp_sibayi_aws | CR200X | NA | continuous | 1_days | continuous | 1_days | Daily | raw_1_days |
| Sibayi Mini AWS | mcp | sibayi_aws | mcp_sibayi_aws | CR200X | NA | continuous | 5_mins | continuous | 5_mins | Five_Minutes | raw_5_mins |
| Sibayi Mini AWS | mcp | sibayi_aws | mcp_sibayi_aws | CR200X | NA | continuous | 1_hours | continuous | 1_hours | Hourly_ | raw_1_hours |
| Sibayi Mini Station | mcp | sibayi_aws | mcp_sibayi_aws | CR200X | NA | continuous | 1_days | continuous | 1_days | Daily | raw_1_days |
| Sibayi Mini Station | mcp | sibayi_aws | mcp_sibayi_aws | CR200X | NA | continuous | 5_mins | continuous | 5_mins | Five_Minutes | raw_5_mins |
| Sibayi Mini Station | mcp | sibayi_aws | mcp_sibayi_aws | CR200X | NA | continuous | 1_hours | continuous | 1_hours | Hourly_ | raw_1_hours |

Once the NA values in the above fields have been populated, rerun header_sts(pipe_house). This imbibes the most recently updated ‘csv’ nomenclature table from the ‘d2_wait_room’ into the pipeline and standardises the header nomenclature.
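Before rerunning header_sts(), it can help to check the edited table for remaining NA values and for standardised names that are not in snake case (lower case, digits and underscores only). The sketch below works on a small data frame built from rows of the table above; in practice the table would be read from the edited ‘csv’ with read.csv(). The helper name is_snake_case() and its regular expression are assumptions for illustration, not ipayipi functions.

```r
# Hypothetical checks on an edited nomenclature table (not ipayipi functions).
nomtab <- data.frame(
  uz_station = c("Sibayi AWS", "CR200Series", "Sibayi"),
  station = c("sibayi_aws", "sibayi_aws", "sibayi_aws"),
  logger_title = c("CR200X", NA, NA),
  record_interval = c("1_days", "1_days", "1_hours"),
  table_name = c("raw_1_days", "raw_1_days", "raw_1_hours"),
  stringsAsFactors = FALSE
)
std_fields <- c("station", "logger_title", "record_interval", "table_name")

# rows with an NA in any standardised field still need editing
needs_edit <- nomtab$uz_station[!stats::complete.cases(nomtab[, std_fields])]
needs_edit

# snake case: lower-case letters or digits, joined by single underscores
is_snake_case <- function(x) grepl("^[a-z0-9]+(_[a-z0-9]+)*$", x)
snake_ok <- is_snake_case(c(nomtab$station, nomtab$table_name, "Sibayi AWS"))
snake_ok
```

The snake-case check is applied only to station and table names here; logger titles such as "CR200X" are kept in upper case in the table above.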

In keeping with good tidy-data standards, keep nomenclature in ‘snake case’ with no special characters (bar the useful underscore). Standardising phenomena metadata follows a similar process to header-data standardisation. If the phenomena standards have been described and there is a ‘phentab.rps’ in the ‘waiting room’, running the code below updates the phenomena details of all files.

phenomena_sts(pipe_house = pipe_house)

If there is no phenomena table (‘phentab.rps’), the NA values in the ‘csv’ copy of the phenomena table need to be described. The following fields in the ‘csv’ phentab must be populated:

Additional fields that are not mandatory include:

If a conversion factor (‘f_convert’, one of the right-hand columns of the table below) is applied to a phenomenon, its standardised units (‘units’) must differ from the unstandardised units (‘uz_units’) in the phenomena table. This ensures that appended phenomena share consistent units.

| phen_name_full | phen_type | phen_name | units | measure | offset | var_type | uz_phen_name | uz_units | uz_measure | f_convert | sensor_id | notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Battery level: maximum voltage | Battery level | batt_max_v | v | max | 0 | num | BattV_Max | Volts | Max | NA | NA | NA |
| Battery level: maximum voltage | Battery level | batt_max_v | v | max | 0 | num | LoggerBatt_Max | Volts | Max | NA | NA | NA |
| Battery level: time of maximum | Battery level | batt_max_v_time | time | smp | 0 | posix | BattV_TMx | Volts | TMx | NA | NA | NA |
| Battery level: minimum voltage | Battery level | batt_min_v | v | min | 0 | num | BattV_Min | Volts | Min | NA | NA | NA |
| Battery level: time of minimum | Battery level | batt_min_v_time | time | smp | 0 | posix | BattV_TMn | Volts | TMn | NA | NA | NA |
| Data logger: serial number | Data logger: serial number | data_logger_sn | smp | smp | 0 | chr | DataloggerSerialNumber | | Smp | NA | NA | NA |
| Humidity, relative: maximum | Humidity, relative | humid_rel_max_pcnt | pcnt | max | 0 | num | RH_Max | % | Max | NA | NA | NA |
| Humidity, relative: time of maximum | Humidity, relative | humid_rel_max_time | time | smp | 0 | posix | RH_TMx | | TMx | NA | NA | NA |
| Humidity, relative: minimum | Humidity, relative | humid_rel_min | pcnt | min | 0 | num | RH_Min | % | Min | NA | NA | NA |
| Humidity, relative: percent | Humidity, relative | humid_rel_pcnt | pcnt | smp | 0 | num | RH | % | Smp | NA | NA | NA |
| Leaf wetness: total time con | Leaf wetness: time con | leaf_con_tot_time | mins | tot | 0 | num | LWMCon_Tot | Minutes | Tot | NA | NA | NA |
| Leaf wetness: total time dry | Leaf wetness: time dry | leaf_dry_tot_time | mins | tot | 0 | num | LWMDry_Tot | Minutes | Tot | NA | NA | NA |
| Leaf wetness: average | Leaf wetness: conductivity | leaf_wet_avg | mv | avg | 0 | num | LWmV_Avg | mV | Avg | NA | NA | NA |
| Leaf wetness: maximum | Leaf wetness | leaf_wet_max | mv | max | 0 | num | LWmV_Max | mV | Max | NA | NA | NA |
| Leaf wetness: time of maximum | Leaf wetness | leaf_wet_max_time | time | smp | 0 | posix | LWmV_TMx | | TMx | NA | NA | NA |
| Leaf wetness: minimum | Leaf wetness | leaf_wet_min | mv | min | 0 | num | LWmV_Min | mV | Min | NA | NA | NA |
| Leaf wetness: time of minimum | Leaf wetness | leaf_wet_min_time | time | smp | 0 | posix | LWmV_TMn | | TMn | NA | NA | NA |
| Leaf wetness: total time wet | Leaf wetness: time wet | leaf_wet_tot_time | mins | tot | 0 | num | LWMWet_Tot | Minutes | Tot | NA | NA | NA |
| Moisture, soil: average percent | Moisture, soil | moisture_soil_pcnt | pcnt | avg | 0 | num | VW_Avg | | Avg | NA | NA | NA |
| Pressure, atmosphere: sample | Pressure, atmosphere | pressure_atm | kpa | smp | 0 | num | BP_kPa | kPa | Smp | NA | NA | NA |
| Pressure, atmosphere: average | Pressure, atmosphere | pressure_atm_avg | kpa | avg | 0 | num | BPressure_Avg | kPa | Avg | NA | NA | NA |
| Pressure, atmosphere: average | Pressure, atmosphere | pressure_atm_avg | kpa | avg | 0 | num | BPressure_Avg | mbar | Avg | 0.1 | NA | Converted to kilopascals by multiplying by 0.1. |
| Pressure, atmosphere: time of maximum | Pressure, atmosphere | pressure_atm_max_time | time | smp | 0 | posix | BPressure_TMx | | TMx | NA | NA | NA |
| Pressure, atmosphere: minimum | Pressure, atmosphere | pressure_atm_min | kpa | min | 0 | num | BPressure_Min | kPa | Min | NA | NA | NA |
| Pressure, atmosphere: time of minimum | Pressure, atmosphere | pressure_atm_min_time | time | smp | 0 | posix | BPressure_TMn | | TMn | NA | NA | NA |
| Rainfall: total | Rainfall | rain_tot | mm | tot | 0 | num | Rain_Tot | mm | Tot | NA | NA | NA |
| Rainfall: total | Rainfall | rain_tot | mm | tot | 0 | num | Rain_mm_Tot | mm | Tot | NA | NA | NA |
| Solar radiation: average | Solar radiation | solar_rad_avg | mj_per_m2 | avg | 0 | num | SlrW_Avg | W/m^2 | Avg | NA | NA | NA |
| Solar radiation: maximum | Solar radiation | solar_rad_max | mj_per_m2 | max | 0 | num | SlrW_Max | W/m^2 | Max | NA | NA | NA |
| Solar radiation: time of maximum | Solar radiation | solar_rad_max_time | time | smp | 0 | posix | SlrW_TMx | | TMx | NA | NA | NA |
| Solar radiation: standard deviation | Solar radiation | solar_rad_sd | w_per_m2 | sd | 0 | num | SlrW_Std | W/m^2 | Std | NA | NA | NA |
| Solar radiation: total | Solar radiation | solar_rad_tot | mj_per_m2 | tot | 0 | num | SlrMJ_Tot | MJ/m^2 | Tot | NA | NA | NA |
| Temperature, air: average | Temperature, air | temp_air_avg | deg_c | avg | 0 | num | AirTC_Avg | Deg C | Avg | NA | NA | NA |
| Temperature, air: maximum | Temperature, air | temp_air_max | deg_c | max | 0 | num | AirTC_Max | Deg C | Max | NA | NA | NA |
| Temperature, air: time of maximum | Temperature, air | temp_air_max_time | time | smp | 0 | posix | AirTC_TMx | Deg C | TMx | NA | NA | NA |
| Temperature, air: minimum | Temperature, air | temp_air_min | deg_c | min | 0 | num | AirTC_Min | Deg C | Min | NA | NA | NA |
| Temperature, air: time of minimum | Temperature, air | temp_air_min_time | time | smp | 0 | posix | AirTC_TMn | Deg C | TMn | NA | NA | NA |
| Temperature, ground level: average | Temperature, ground level | temp_ground_avg | deg_c | avg | 0 | num | T107_C_Avg | Deg C | Avg | NA | NA | NA |
| Temperature, ground level: minimum | Temperature, ground level | temp_ground_min | deg_c | min | 0 | num | T107_C_Min | Deg C | Min | NA | NA | NA |
| Temperature, logger: average | Temperature, logger | temp_logg_avg | deg_c | avg | 0 | num | LoggerTemp_Avg | DegC | Avg | NA | NA | NA |
| Temperature, logger: maximum | Temperature, logger | temp_logg_max | deg_c | max | 0 | num | LoggerTemp_Max | DegC | Max | NA | NA | NA |
| Temperature, logger: time of maximum | Temperature, logger | temp_logg_max_time | time | smp | 0 | posix | LoggerTemp_TMx | | TMx | NA | NA | NA |
| Temperature, logger: minimum | Temperature, logger | temp_logg_min | deg_c | min | 0 | num | LoggerTemp_Min | DegC | Min | NA | NA | NA |
| Temperature, logger: time of minimum | Temperature, logger | temp_logg_min_time | time | smp | 0 | posix | LoggerTemp_TMn | | TMn | NA | NA | NA |
| Temperature, soil: average | Temperature, soil | temp_soil_avg | deg_c | avg | 0 | num | SoilTemp_Avg | Deg C | Avg | NA | NA | NA |
| Temperature, soil: maximum | Temperature, soil | temp_soil_max | deg_c | max | 0 | num | SoilTemp_Max | Deg C | Max | NA | NA | NA |
| Temperature, soil: time of maximum | Temperature, soil | temp_soil_max_time | time | smp | 0 | posix | SoilTemp_TMx | | TMx | NA | NA | NA |
| Temperature, soil: minimum | Temperature, soil | temp_soil_min | deg_c | min | 0 | num | SoilTemp_Min | Deg C | Min | NA | NA | NA |
| Temperature, soil: time of minimum | Temperature, soil | temp_soil_min_time | time | smp | 0 | posix | SoilTemp_TMn | | TMn | NA | NA | NA |
| UV radiation: average | UV radiation | uv_rad_avg | mj_per_m2 | avg | 0 | num | CUV5_W_Avg | W/m^2 | Avg | NA | NA | NA |
| UV radiation: average | UV radiation | uv_rad_avg | w_per_m2 | avg | 0 | num | UV_W_Avg | W/m^2 | Avg | NA | NA | NA |
| UV radiation: maximum | UV radiation | uv_rad_max | mj_per_m2 | max | 0 | num | CUV5_W_Max | W/m^2 | Max | NA | NA | NA |
| UV radiation: time of maximum | UV radiation | uv_rad_max_time | time | smp | 0 | posix | UV_W_TMx | | TMx | NA | NA | NA |
| UV radiation: standard deviation | UV radiation | uv_rad_sd | mj_per_m2 | sd | 0 | num | CUV5_W_Std | W/m^2 | Std | NA | NA | NA |
| UV radiation: sensitivity | UV radiation, sensor | uv_rad_sensi | desc | smp | 0 | num | UV_Sensitivity | | Smp | NA | NA | NA |
| UV radiation: sensor sn | UV radiation, sensor | uv_rad_sn | desc | smp | 0 | int | UV_SN | | Smp | NA | NA | NA |
| UV radiation: total | UV radiation | uv_rad_tot | mj_per_m2 | tot | 0 | num | CUV5_MJ_Tot | MJ/m^2 | Tot | NA | NA | NA |
| Wind direction: instantaneous | Wind direction | wind_dir | deg | smp | 0 | num | WindDir_D1_WVT | Deg | WVc | NA | NA | NA |
| Wind direction: standard deviation | Wind direction | wind_dir_sd | deg | sd | 0 | num | WindDir_SD1_WVT | Deg | WVc | NA | NA | NA |
| Wind speed: sample | Wind speed | wind_speed | m_per_sec | smp | 0 | num | WS_ms | meters/second | Smp | NA | NA | NA |
| Wind speed: average | Wind speed | wind_speed_avg | m_per_sec | avg | 0 | num | WS_ms_Avg | meters/second | Avg | NA | NA | NA |
| Wind speed: maximum | Wind speed | wind_speed_max | m_per_sec | max | 0 | num | WS_ms_Max | meters/sec | Max | NA | NA | NA |
| Wind speed: time of maximum | Wind speed | wind_speed_max_time | time | smp | 0 | posix | WS_ms_TMx | meters/second | TMx | NA | NA | NA |
| Wind speed: minimum | Wind speed | wind_speed_min | m_per_sec | min | 0 | num | WS_ms_Min | meters/second | Min | NA | NA | NA |
| Wind speed: minimum | Wind speed | wind_speed_min | m_per_sec | min | 0 | num | WSpeed_Min | meters/second | Min | NA | NA | NA |
| Wind speed: time of minimum | Wind speed | wind_speed_min_time | time | smp | 0 | posix | WS_ms_TMn | meters/second | TMn | NA | NA | NA |
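The effect of ‘f_convert’ can be sketched with the BPressure_Avg row above, where pressure recorded in mbar is multiplied by 0.1 to give kPa. The hypothetical standardise_phens() below is not how phenomena_sts() is implemented: it renames unstandardised columns to their standard phen_name, applies any conversion factor, and checks the rule that converted phenomena must change units. It ignores the ‘offset’ column, and the pressure readings are illustrative values only.

```r
# Hypothetical sketch, not ipayipi's phenomena_sts(): rename columns to their
# standard names and apply any f_convert factor.
standardise_phens <- function(raw, phen) {
  conv <- !is.na(phen$f_convert)
  # converted phenomena must end up in different units from uz_units
  stopifnot(all(phen$units[conv] != phen$uz_units[conv]))
  out <- raw
  for (i in seq_len(nrow(phen))) {
    x <- raw[[phen$uz_phen_name[i]]]
    if (conv[i]) x <- x * phen$f_convert[i]
    out[[phen$uz_phen_name[i]]] <- NULL  # drop the unstandardised column
    out[[phen$phen_name[i]]] <- x        # add it back under the standard name
  }
  out
}

phen <- data.frame(
  uz_phen_name = c("BPressure_Avg", "AirTC_Avg"),
  phen_name = c("pressure_atm_avg", "temp_air_avg"),
  uz_units = c("mbar", "Deg C"),
  units = c("kpa", "deg_c"),
  f_convert = c(0.1, NA),
  stringsAsFactors = FALSE
)
raw <- data.frame(BPressure_Avg = c(1013.2, 1009.8), AirTC_Avg = c(26.21, 26.33))

std <- standardise_phens(raw, phen)
std
```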

After replacing the NA values, rerun phenomena_sts(pipe_house) to imbibe the updated phenomena descriptions and update the logger data being standardised.

Standardised data files are transferred to the ‘nomenclature vetted’ directory (‘d3_nomvet_room’) using the function below. After the transfer, files in the waiting room (except the nomtab and phentab standards) are automatically removed.

# move standardised files to a storage directory
transfer_sts_files(pipe_house)

Archiving raw data files: before raw, unstandardised files are removed, and provided there is a ‘d0_raw_room’ directory in the pipeline working directory, raw input data files are copied to this directory and filed in folders by the year and month of their latest recording date. This is done by the imbibe_raw_batch() function.
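Filing by year and month of the latest recording date could look like the sketch below. The archive_raw_file() helper and the folder layout (a year folder containing month folders) are assumptions for illustration; only the idea of filing by year and month comes from the text. The example date-time is the end of the first 5-minute file listed in the data summary of this vignette.

```r
# Hypothetical sketch: copy a raw file into <raw_room>/<year>/<month>/ based
# on its last recorded date-time; the layout is assumed, not ipayipi's own.
archive_raw_file <- function(file, last_dttm, raw_room) {
  sub_dir <- file.path(raw_room, format(last_dttm, "%Y"), format(last_dttm, "%m"))
  dir.create(sub_dir, recursive = TRUE, showWarnings = FALSE)
  # never overwrite an already archived copy
  file.copy(file, sub_dir, overwrite = FALSE)
  file.path(sub_dir, basename(file))
}

raw_room <- file.path(tempdir(), "d0_raw_room_eg")
raw_file <- file.path(tempdir(), "Five_mins_eg.dat")
writeLines("toy raw logger file", raw_file)
last_dttm <- as.POSIXct("2021-08-11 15:15:00", tz = "UTC")

archived <- archive_raw_file(raw_file, last_dttm, raw_room)
archived
```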

Make station files

The append_station_batch() function updates station files in the ‘d4_ipip_room’ with files from the ‘d3_nomvet_room’.

# append station files + metadata records
# see the 'cores' argument: parallel processing is supported on Linux systems
append_station_batch(pipe_house)
## [1] "/tmp/Rtmp4GQshS/sf/mcp_vasi_science_centre_aws.ipip/data_summary"
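Appending can be pictured as stacking a newer standardised chunk onto the existing station table and resolving overlapping records. The sketch below is hypothetical (append_chunk() is not ipayipi's append_station_batch()): it keeps the first record for any duplicated timestamp, so existing station data take precedence, and then sorts by date_time. The humidity values come from the 5-minute table shown in this vignette; treating the timestamps as UTC is an assumption.

```r
# Hypothetical sketch of appending a newer chunk to a station table:
# stack, keep the first record per timestamp, then sort by date_time.
append_chunk <- function(station_tbl, new_tbl) {
  out <- rbind(station_tbl, new_tbl)
  out <- out[!duplicated(out$date_time), ]
  out <- out[order(out$date_time), ]
  rownames(out) <- NULL
  out
}

old <- data.frame(
  date_time = as.POSIXct(c("2021-06-16 11:50:00", "2021-06-16 11:55:00",
                           "2021-06-16 12:00:00"), tz = "UTC"),
  humid_rel_pcnt = c(47.08, 44.15, 43.78)
)
# the new chunk overlaps the old one at 12:00
new <- data.frame(
  date_time = as.POSIXct(c("2021-06-16 12:00:00", "2021-06-16 12:05:00"),
                         tz = "UTC"),
  humid_rel_pcnt = c(43.78, 44.90)
)

stn <- append_chunk(old, new)
stn
```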

Now that a station file has been generated for the Vasi Science Centre weather station, we can check which tables have been created or appended. Station files are kept in the ‘d4_ipip_room’ of the pipeline’s folder structure.

# list station files in the ipip directory
sf <- dta_list(
  input_dir = pipe_house$d4_ipip_room, # search directory
  file_ext = ".ipip" # note the station's default file extension
)

# check what stations are in the ipip room
print(sf)
## [1] "mcp_vasi_science_centre_aws.ipip"
# read in the station file
sf <- readRDS(file.path(pipe_house$d4_ipip_room, sf[1]))

# names of the tables stored in the station file
names(sf)
## [1] "data_summary"      "gaps"              "logg_interfere"    "phen_data_summary" "phens"             "raw_1_days"        "raw_1_hours"       "raw_5_mins"
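Because the station file is read with readRDS() and names() lists its tables, it behaves like a named list of tables saved as an R data file. The toy example below assumes exactly that (a real station file holds more tables and metadata), and its contents are illustrative. It also shows a detail worth knowing when accessing tables: `$` partially matches list names, whereas `[[` requires the exact name.

```r
# Toy "station file": a named list of tables saved with saveRDS()
# (an assumption based on the readRDS() call above).
toy_station <- list(
  raw_5_mins = data.frame(
    date_time = as.POSIXct(c("2021-06-16 11:50:00", "2021-06-16 11:55:00"),
                           tz = "UTC"),
    humid_rel_pcnt = c(47.08, 44.15)
  ),
  raw_1_hours = data.frame(
    date_time = as.POSIXct("2021-06-16 11:00:00", tz = "UTC")
  )
)
f <- file.path(tempdir(), "toy_station.ipip")
saveRDS(toy_station, f)

sf_toy <- readRDS(f)
names(sf_toy)
# number of rows in each table
n_rows <- vapply(sf_toy, nrow, integer(1))
n_rows

# `$` partially matches "raw_5_min" to "raw_5_mins"; `[[` does not
str(sf_toy$raw_5_min)
is.null(sf_toy[["raw_5_min"]])
```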

‘Raw’ data

Our station file has three ‘raw’ data tables, with 5-minute, hourly, and daily data.

# look at the first row of the 5-minute data
print(sf$raw_5_mins[1, ])
##        id           date_time humid_rel_pcnt rain_tot solar_rad_avg temp_air_avg temp_ground_avg uv_rad_avg wind_dir wind_dir_sd wind_speed
##     <num>              <POSc>          <num>    <num>         <num>        <num>           <num>      <num>    <num>       <num>      <num>
## 1: 106550 2021-06-16 11:50:00          47.08        0         634.6        26.21           28.25      22.56    1.573       38.82      2.737
# use kableExtra to view the first 20 data rows; default table printing
#  doesn't look good in HTML
kbl(sf$raw_5_mins[1:20, ]) |>
  kable_paper("hover") |>
  kable_styling(font_size = 11) |>
  scroll_box(width = "100%", height = "400px")
| id | date_time | humid_rel_pcnt | rain_tot | solar_rad_avg | temp_air_avg | temp_ground_avg | uv_rad_avg | wind_dir | wind_dir_sd | wind_speed |
|---|---|---|---|---|---|---|---|---|---|---|
| 106550 | 2021-06-16 11:50:00 | 47.08 | 0 | 634.6 | 26.21 | 28.25 | 22.56 | 1.573 | 38.82 | 2.737 |
| 106551 | 2021-06-16 11:55:00 | 44.15 | 0 | 622.0 | 26.33 | 28.47 | 22.75 | 336.700 | 33.84 | 3.340 |
| 106552 | 2021-06-16 12:00:00 | 43.78 | 0 | 601.5 | 26.27 | 28.30 | 22.67 | 353.200 | 28.91 | 2.962 |
| 106553 | 2021-06-16 12:05:00 | 44.90 | 0 | 599.2 | 26.28 | 28.31 | 22.79 | 350.100 | 34.29 | 2.912 |
| 106554 | 2021-06-16 12:10:00 | 43.76 | 0 | 599.5 | 26.24 | 28.31 | 22.67 | 337.900 | 30.72 | 3.653 |
| 106555 | 2021-06-16 12:15:00 | 43.23 | 0 | 598.9 | 26.25 | 28.31 | 24.64 | 354.300 | 40.26 | 3.408 |
| 106556 | 2021-06-16 12:20:00 | 42.78 | 0 | 590.2 | 26.24 | 28.31 | 24.20 | 353.300 | 29.10 | 3.345 |
| 106557 | 2021-06-16 12:25:00 | 43.34 | 0 | 590.0 | 26.13 | 28.17 | 24.35 | 1.191 | 29.01 | 3.733 |
| 106558 | 2021-06-16 12:30:00 | 42.84 | 0 | 583.1 | 26.39 | 28.46 | 24.19 | 351.900 | 21.85 | 3.298 |
| 106559 | 2021-06-16 12:35:00 | 43.16 | 0 | 571.4 | 26.41 | 28.49 | 23.96 | 350.400 | 25.53 | 3.115 |
| 106560 | 2021-06-16 12:40:00 | 43.34 | 0 | 480.5 | 26.16 | 28.00 | 21.85 | 3.688 | 20.48 | 3.702 |
| 106561 | 2021-06-16 12:45:00 | 42.70 | 0 | 561.2 | 26.20 | 27.96 | 23.53 | 1.266 | 32.19 | 2.842 |
| 106562 | 2021-06-16 12:50:00 | 41.89 | 0 | 600.7 | 26.58 | 28.39 | 24.08 | 351.600 | 34.39 | 3.072 |
| 106563 | 2021-06-16 12:55:00 | 40.63 | 0 | 591.7 | 26.59 | 28.57 | 23.20 | 0.684 | 41.28 | 3.232 |
| 106564 | 2021-06-16 13:00:00 | 41.71 | 0 | 503.1 | 26.57 | 28.55 | 20.96 | 354.800 | 39.68 | 2.718 |
| 106565 | 2021-06-16 13:05:00 | 41.66 | 0 | 612.0 | 26.48 | 28.54 | 23.77 | 355.100 | 30.45 | 3.355 |
| 106566 | 2021-06-16 13:10:00 | 41.51 | 0 | 629.2 | 26.70 | 28.76 | 23.90 | 1.283 | 29.65 | 3.183 |
| 106567 | 2021-06-16 13:15:00 | 40.52 | 0 | 582.0 | 26.46 | 28.53 | 22.86 | 5.513 | 34.68 | 3.655 |
| 106568 | 2021-06-16 13:20:00 | 40.89 | 0 | 540.9 | 26.69 | 28.74 | 21.51 | 355.500 | 29.85 | 2.590 |
| 106569 | 2021-06-16 13:25:00 | 41.60 | 0 | 530.9 | 26.76 | 28.86 | 21.23 | 0.879 | 38.26 | 3.357 |

The ‘raw’ data-header summary

This table contains summary information on the origin of each data file used to make up the station file.

# using kableExtra
kbl(sf$data_summary) |> kable_paper("hover") |> kable_styling(font_size = 11) |>
  scroll_box(width = "100%", height = "400px")
| dsid | file_format | uz_station | location | station | stnd_title | start_dttm | end_dttm | logger_type | logger_title | logger_sn | logger_os | logger_program_name | logger_program_sig | uz_record_interval_type | uz_record_interval | record_interval_type | record_interval | dttm_inc_exc | dttm_ie_chng | uz_table_name | table_name | nomvet_name | file_origin | logg_interfere |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TOA5 | Science_Centre_AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | 2021-06-16 00:00:00 | 2021-08-10 00:00:00 | CR1000 | NA | 88296 | CR1000.Std.31.08 | CPU:Vasi_Science Centre AWS.CR1 | 64103 | continuous | 1_days | continuous | 1_days | FALSE | TRUE | Daily | raw_1_days | mcp_vasi_science_centre_aws_1_days_20210616-20210810__1.ipi | NA | on_site |
| 4 | TOA5 | Science_Centre_AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | 2021-08-11 00:00:00 | 2021-09-12 00:00:00 | CR1000 | NA | 88296 | CR1000.Std.31.08 | CPU:Vasi_Science Centre AWS.CR1 | 64103 | continuous | 1_days | continuous | 1_days | FALSE | TRUE | Daily | raw_1_days | mcp_vasi_science_centre_aws_1_days_20210811-20210912__1.ipi | NA | on_site |
| 7 | TOA5 | Science_Centre_AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | 2021-09-13 00:00:00 | 2021-10-12 00:00:00 | CR1000 | NA | 88296 | CR1000.Std.31.08 | CPU:Vasi_Science Centre AWS.CR1 | 64103 | continuous | 1_days | continuous | 1_days | FALSE | TRUE | Daily | raw_1_days | mcp_vasi_science_centre_aws_1_days_20210913-20211012__1.ipi | NA | on_site |
| 10 | TOA5 | Science_Centre_AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | 2021-10-13 00:00:00 | 2021-11-07 00:00:00 | CR1000 | NA | 88296 | CR1000.Std.31.08 | CPU:Vasi_Science Centre AWS.CR1 | 64103 | continuous | 1_days | continuous | 1_days | FALSE | TRUE | Daily | raw_1_days | mcp_vasi_science_centre_aws_1_days_20211013-20211107__1.ipi | NA | on_site |
| 13 | TOA5 | Science_Centre_AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | 2021-11-08 00:00:00 | 2021-12-06 00:00:00 | CR1000 | NA | 88296 | CR1000.Std.31.08 | CPU:Vasi_Science Centre AWS.CR1 | 64103 | continuous | 1_days | continuous | 1_days | FALSE | TRUE | Daily | raw_1_days | mcp_vasi_science_centre_aws_1_days_20211108-20211206__1.ipi | NA | on_site |
| 16 | TOA5 | Science_Centre_AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | 2021-12-07 00:00:00 | 2022-01-11 00:00:00 | CR1000 | NA | 88296 | CR1000.Std.31.08 | CPU:Vasi_Science Centre AWS.CR1 | 64103 | continuous | 1_days | continuous | 1_days | FALSE | TRUE | Daily | raw_1_days | mcp_vasi_science_centre_aws_1_days_20211207-20220111__1.ipi | NA | on_site |
| 19 | TOA5 | Science_Centre_AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | 2022-02-09 00:00:00 | 2022-03-06 00:00:00 | CR1000 | NA | 88296 | CR1000.Std.31.08 | CPU:Vasi_Science Centre AWS.CR1 | 64103 | continuous | 1_days | continuous | 1_days | FALSE | TRUE | Daily | raw_1_days | mcp_vasi_science_centre_aws_1_days_20220209-20220306__1.ipi | NA | on_site |
| 2 | TOA5 | Science_Centre_AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | 2021-06-16 11:00:00 | 2021-08-11 14:00:00 | CR1000 | NA | 88296 | CR1000.Std.31.08 | CPU:Vasi_Science Centre AWS.CR1 | 64103 | continuous | 1_hours | continuous | 1_hours | FALSE | TRUE | Hourly | raw_1_hours | mcp_vasi_science_centre_aws_1_hours_20210616-20210811__1.ipi | NA | on_site |
| 5 | TOA5 | Science_Centre_AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | 2021-08-11 15:00:00 | 2021-09-13 14:00:00 | CR1000 | NA | 88296 | CR1000.Std.31.08 | CPU:Vasi_Science Centre AWS.CR1 | 64103 | continuous | 1_hours | continuous | 1_hours | FALSE | TRUE | Hourly | raw_1_hours | mcp_vasi_science_centre_aws_1_hours_20210811-20210913__1.ipi | NA | on_site |
| 8 | TOA5 | Science_Centre_AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | 2021-09-13 15:00:00 | 2021-10-13 14:00:00 | CR1000 | NA | 88296 | CR1000.Std.31.08 | CPU:Vasi_Science Centre AWS.CR1 | 64103 | continuous | 1_hours | continuous | 1_hours | FALSE | TRUE | Hourly | raw_1_hours | mcp_vasi_science_centre_aws_1_hours_20210913-20211013__1.ipi | NA | on_site |
| 11 | TOA5 | Science_Centre_AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | 2021-10-13 15:00:00 | 2021-11-08 11:00:00 | CR1000 | NA | 88296 | CR1000.Std.31.08 | CPU:Vasi_Science Centre AWS.CR1 | 64103 | continuous | 1_hours | continuous | 1_hours | FALSE | TRUE | Hourly | raw_1_hours | mcp_vasi_science_centre_aws_1_hours_20211013-20211108__1.ipi | NA | on_site |
| 14 | TOA5 | Science_Centre_AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | 2021-11-08 12:00:00 | 2021-12-07 13:00:00 | CR1000 | NA | 88296 | CR1000.Std.31.08 | CPU:Vasi_Science Centre AWS.CR1 | 64103 | continuous | 1_hours | continuous | 1_hours | FALSE | TRUE | Hourly | raw_1_hours | mcp_vasi_science_centre_aws_1_hours_20211108-20211207__1.ipi | NA | on_site |
| 17 | TOA5 | Science_Centre_AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | 2021-12-07 14:00:00 | 2022-01-12 10:00:00 | CR1000 | NA | 88296 | CR1000.Std.31.08 | CPU:Vasi_Science Centre AWS.CR1 | 64103 | continuous | 1_hours | continuous | 1_hours | FALSE | TRUE | Hourly | raw_1_hours | mcp_vasi_science_centre_aws_1_hours_20211207-20220112__1.ipi | NA | on_site |
| 20 | TOA5 | Science_Centre_AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | 2022-02-09 16:00:00 | 2022-03-07 15:00:00 | CR1000 | NA | 88296 | CR1000.Std.31.08 | CPU:Vasi_Science Centre AWS.CR1 | 64103 | continuous | 1_hours | continuous | 1_hours | FALSE | TRUE | Hourly | raw_1_hours | mcp_vasi_science_centre_aws_1_hours_20220209-20220307__1.ipi | NA | on_site |
| 3 | TOA5 | Science_Centre_AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | 2021-06-16 11:50:00 | 2021-08-11 15:15:00 | CR1000 | NA | 88296 | CR1000.Std.31.08 | CPU:Vasi_Science Centre AWS.CR1 | 64103 | continuous | 5_mins | continuous | 5_mins | FALSE | TRUE | Five_mins | raw_5_mins | mcp_vasi_science_centre_aws_5_mins_20210616-20210811__1.ipi | NA | on_site |
| 6 | TOA5 | Science_Centre_AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | 2021-08-11 15:20:00 | 2021-09-13 15:40:00 | CR1000 | NA | 88296 | CR1000.Std.31.08 | CPU:Vasi_Science Centre AWS.CR1 | 64103 | continuous | 5_mins | continuous | 5_mins | FALSE | TRUE | Five_mins | raw_5_mins | mcp_vasi_science_centre_aws_5_mins_20210811-20210913__1.ipi | NA | on_site |
| 9 | TOA5 | Science_Centre_AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | 2021-09-13 15:45:00 | 2021-10-13 15:35:00 | CR1000 | NA | 88296 | CR1000.Std.31.08 | CPU:Vasi_Science Centre AWS.CR1 | 64103 | continuous | 5_mins | continuous | 5_mins | FALSE | TRUE | Five_mins | raw_5_mins | mcp_vasi_science_centre_aws_5_mins_20210913-20211013__1.ipi | NA | on_site |
| 12 | TOA5 | Science_Centre_AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | 2021-10-13 15:40:00 | 2021-11-08 12:50:00 | CR1000 | NA | 88296 | CR1000.Std.31.08 | CPU:Vasi_Science Centre AWS.CR1 | 64103 | continuous | 5_mins | continuous | 5_mins | FALSE | TRUE | Five_mins | raw_5_mins | mcp_vasi_science_centre_aws_5_mins_20211013-20211108__1.ipi | NA | on_site |
| 15 | TOA5 | Science_Centre_AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | 2021-11-08 12:55:00 | 2021-12-07 14:35:00 | CR1000 | NA | 88296 | CR1000.Std.31.08 | CPU:Vasi_Science Centre AWS.CR1 | 64103 | continuous | 5_mins | continuous | 5_mins | FALSE | TRUE | Five_mins | raw_5_mins | mcp_vasi_science_centre_aws_5_mins_20211108-20211207__1.ipi | NA | on_site |
| 18 | TOA5 | Science_Centre_AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | 2021-12-07 14:40:00 | 2022-01-12 11:50:00 | CR1000 | NA | 88296 | CR1000.Std.31.08 | CPU:Vasi_Science Centre AWS.CR1 | 64103 | continuous | 5_mins | continuous | 5_mins | FALSE | TRUE | Five_mins | raw_5_mins | mcp_vasi_science_centre_aws_5_mins_20211207-20220112__1.ipi | NA | on_site |
| 21 | TOA5 | Science_Centre_AWS | mcp | vasi_science_centre_aws | mcp_vasi_science_centre_aws | 2022-02-09 16:05:00 | 2022-03-07 16:05:00 | CR1000 | NA | 88296 | CR1000.Std.31.08 | CPU:Vasi_Science Centre AWS.CR1 | 64103 | continuous | 5_mins | continuous | 5_mins | FALSE | TRUE | Five_mins | raw_5_mins | mcp_vasi_science_centre_aws_5_mins_20220209-20220307__1.ipi | NA | on_site |

The phenomena table: ‘phens’

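The station file's version of the phenomena standards. Note that each phenomenon variation/synonym has a unique identifier (‘phid’) within the scope of this station. A phenomenon recorded in more than one table (e.g., ‘rain_tot’, phid 3) keeps the same ‘phid’ in each.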

# using kableExtra
kbl(sf$phens) |> kable_paper("hover") |> kable_styling(font_size = 11) |>
  scroll_box(width = "100%", height = "400px")
phid phen_name_full phen_type phen_name units measure offset var_type uz_phen_name uz_units uz_measure f_convert sensor_id notes table_name
1 Humidity, relative: maximum Humidity, relative humid_rel_max_pcnt pcnt max 0 num RH_Max % Max NA no_spec NA raw_1_days
2 Humidity, relative: minimum Humidity, relative humid_rel_min pcnt min 0 num RH_Min % Min NA no_spec NA raw_1_days
3 Rainfall: total Rainfall rain_tot mm tot 0 num Rain_mm_Tot mm Tot NA no_spec NA raw_1_days
4 Solar radiation: maximum Solar radiation solar_rad_max w_per_m2 max 0 num SlrW_Max W/m^2 Max NA no_spec NA raw_1_days
5 Solar radiation: standard deviation Solar radiation solar_rad_sd w_per_m2 sd 0 num SlrW_Std W/m^2 Std NA no_spec NA raw_1_days
6 Temperature, air: maximum Temperature, air temp_air_max deg_c max 0 num AirTC_Max Deg C Max NA no_spec NA raw_1_days
7 Temperature, air: minimum Temperature, air temp_air_min deg_c min 0 num AirTC_Min Deg C Min NA no_spec NA raw_1_days
8 Temperature, ground level: average Temperature, ground level temp_ground_avg deg_c avg 0 num T107_C_Avg Deg C Avg NA no_spec NA raw_1_days
9 Temperature, ground level: minimum Temperature, ground level temp_ground_min deg_c min 0 num T107_C_Min Deg C Min NA no_spec NA raw_1_days
10 UV radiation: maximum UV radiation uv_rad_max w_per_m2 max 0 num CUV5_W_Max W/m^2 Max NA no_spec NA raw_1_days
11 UV radiation: standard deviation UV radiation uv_rad_sd w_per_m2 sd 0 num CUV5_W_Std W/m^2 Std NA no_spec NA raw_1_days
12 UV radiation: total UV radiation uv_rad_tot mj_per_m2 tot 0 num CUV5_MJ_Tot MJ/m^2 Tot NA no_spec NA raw_1_days
13 Wind direction: sample Wind direction wind_dir deg smp 0 num WindDir_D1_WVT Deg WVc NA no_spec NA raw_1_days
14 Wind direction: standard deviation Wind direction wind_dir_sd deg sd 0 num WindDir_SD1_WVT Deg WVc NA no_spec NA raw_1_days
15 Wind speed: average Wind speed wind_speed_avg m_per_sec avg 0 num VW_Avg no_spec Avg NA no_spec NA raw_1_days
16 Wind speed: maximum Wind speed wind_speed_max m_per_sec max 0 num WS_ms_Max meters/second Max NA no_spec NA raw_1_days
17 Wind speed: sample Wind speed wind_speed m_per_sec smp 0 num WS_ms_S_WVT meters/second WVc NA no_spec NA raw_1_days
18 Battery level: minimum voltage Battery level batt_min_v v min 0 num BattV_Min Volts Min NA no_spec NA raw_1_hours
19 Humidity, relative: percent Humidity, relative humid_rel_pcnt pcnt smp 0 num RH % Smp NA no_spec NA raw_1_hours
20 Pressure, atmosphere: sample Pressure, atmosphere pressure_atm kpa smp 0 num BP_kPa kPa Smp NA no_spec NA raw_1_hours
3 Rainfall: total Rainfall rain_tot mm tot 0 num Rain_mm_Tot mm Tot NA no_spec NA raw_1_hours
22 Solar radiation: average Solar radiation solar_rad_avg w_per_m2 avg 0 num SlrW_Avg W/m^2 Avg NA no_spec NA raw_1_hours
23 Temperature, air: average Temperature, air temp_air_avg deg_c avg 0 num AirTC_Avg Deg C Avg NA no_spec NA raw_1_hours
8 Temperature, ground level: average Temperature, ground level temp_ground_avg deg_c avg 0 num T107_C_Avg Deg C Avg NA no_spec NA raw_1_hours
9 Temperature, ground level: minimum Temperature, ground level temp_ground_min deg_c min 0 num T107_C_Min Deg C Min NA no_spec NA raw_1_hours
26 UV radiation: average UV radiation uv_rad_avg w_per_m2 avg 0 num CUV5_W_Avg W/m^2 Avg NA no_spec NA raw_1_hours
13 Wind direction: sample Wind direction wind_dir deg smp 0 num WindDir_D1_WVT Deg WVc NA no_spec NA raw_1_hours
14 Wind direction: standard deviation Wind direction wind_dir_sd deg sd 0 num WindDir_SD1_WVT Deg WVc NA no_spec NA raw_1_hours
15 Wind speed: average Wind speed wind_speed_avg m_per_sec avg 0 num VW_Avg no_spec Avg NA no_spec NA raw_1_hours
17 Wind speed: sample Wind speed wind_speed m_per_sec smp 0 num WS_ms_S_WVT meters/second WVc NA no_spec NA raw_1_hours
19 Humidity, relative: percent Humidity, relative humid_rel_pcnt pcnt smp 0 num RH % Smp NA no_spec NA raw_5_mins
3 Rainfall: total Rainfall rain_tot mm tot 0 num Rain_mm_Tot mm Tot NA no_spec NA raw_5_mins
22 Solar radiation: average Solar radiation solar_rad_avg w_per_m2 avg 0 num SlrW_Avg W/m^2 Avg NA no_spec NA raw_5_mins
23 Temperature, air: average Temperature, air temp_air_avg deg_c avg 0 num AirTC_Avg Deg C Avg NA no_spec NA raw_5_mins
8 Temperature, ground level: average Temperature, ground level temp_ground_avg deg_c avg 0 num T107_C_Avg Deg C Avg NA no_spec NA raw_5_mins
26 UV radiation: average UV radiation uv_rad_avg w_per_m2 avg 0 num CUV5_W_Avg W/m^2 Avg NA no_spec NA raw_5_mins
13 Wind direction: sample Wind direction wind_dir deg smp 0 num WindDir_D1_WVT Deg WVc NA no_spec NA raw_5_mins
14 Wind direction: standard deviation Wind direction wind_dir_sd deg sd 0 num WindDir_SD1_WVT Deg WVc NA no_spec NA raw_5_mins
17 Wind speed: sample Wind speed wind_speed m_per_sec smp 0 num WS_ms_S_WVT meters/second WVc NA no_spec NA raw_5_mins

Note how ‘phid’ links each row of the temporal phenomena summary below back to the phenomena table.

# using kableExtra
kbl(sf$phen_data_summary) |> kable_paper("hover") |> kable_styling(font_size = 11) |>
  scroll_box(width = "100%", height = "400px")
phid phen_name start_dttm end_dttm table_name
1 humid_rel_max_pcnt 2021-06-16 00:00:00 2022-03-06 00:00:00 raw_1_days
2 humid_rel_min 2021-06-16 00:00:00 2022-03-06 00:00:00 raw_1_days
3 rain_tot 2021-06-16 00:00:00 2022-03-06 00:00:00 raw_1_days
4 solar_rad_max 2021-06-16 00:00:00 2022-03-06 00:00:00 raw_1_days
5 solar_rad_sd 2021-06-16 00:00:00 2022-03-06 00:00:00 raw_1_days
6 temp_air_max 2021-06-16 00:00:00 2022-03-06 00:00:00 raw_1_days
7 temp_air_min 2021-06-16 00:00:00 2022-03-06 00:00:00 raw_1_days
8 temp_ground_avg 2021-06-16 00:00:00 2022-03-06 00:00:00 raw_1_days
9 temp_ground_min 2021-06-16 00:00:00 2022-03-06 00:00:00 raw_1_days
10 uv_rad_max 2021-06-16 00:00:00 2022-03-06 00:00:00 raw_1_days
11 uv_rad_sd 2021-06-16 00:00:00 2022-03-06 00:00:00 raw_1_days
12 uv_rad_tot 2021-06-16 00:00:00 2022-03-06 00:00:00 raw_1_days
13 wind_dir 2021-06-16 00:00:00 2022-03-06 00:00:00 raw_1_days
14 wind_dir_sd 2021-06-16 00:00:00 2022-03-06 00:00:00 raw_1_days
17 wind_speed 2021-06-16 00:00:00 2022-03-06 00:00:00 raw_1_days
15 wind_speed_avg 2021-06-16 00:00:00 2022-03-06 00:00:00 raw_1_days
16 wind_speed_max 2021-06-16 00:00:00 2022-03-06 00:00:00 raw_1_days
18 batt_min_v 2021-06-16 11:00:00 2022-03-07 15:00:00 raw_1_hours
19 humid_rel_pcnt 2021-06-16 11:00:00 2022-03-07 15:00:00 raw_1_hours
20 pressure_atm 2021-06-16 11:00:00 2022-03-07 15:00:00 raw_1_hours
3 rain_tot 2021-06-16 11:00:00 2022-03-07 15:00:00 raw_1_hours
22 solar_rad_avg 2021-06-16 11:00:00 2022-03-07 15:00:00 raw_1_hours
23 temp_air_avg 2021-06-16 11:00:00 2022-03-07 15:00:00 raw_1_hours
8 temp_ground_avg 2021-06-16 11:00:00 2022-03-07 15:00:00 raw_1_hours
9 temp_ground_min 2021-06-16 11:00:00 2022-03-07 15:00:00 raw_1_hours
26 uv_rad_avg 2021-06-16 11:00:00 2022-03-07 15:00:00 raw_1_hours
13 wind_dir 2021-06-16 11:00:00 2022-03-07 15:00:00 raw_1_hours
14 wind_dir_sd 2021-06-16 11:00:00 2022-03-07 15:00:00 raw_1_hours
17 wind_speed 2021-06-16 11:00:00 2022-03-07 15:00:00 raw_1_hours
15 wind_speed_avg 2021-06-16 11:00:00 2022-03-07 15:00:00 raw_1_hours
19 humid_rel_pcnt 2021-06-16 11:50:00 2022-03-07 16:05:00 raw_5_mins
3 rain_tot 2021-06-16 11:50:00 2022-03-07 16:05:00 raw_5_mins
22 solar_rad_avg 2021-06-16 11:50:00 2022-03-07 16:05:00 raw_5_mins
23 temp_air_avg 2021-06-16 11:50:00 2022-03-07 16:05:00 raw_5_mins
8 temp_ground_avg 2021-06-16 11:50:00 2022-03-07 16:05:00 raw_5_mins
26 uv_rad_avg 2021-06-16 11:50:00 2022-03-07 16:05:00 raw_5_mins
13 wind_dir 2021-06-16 11:50:00 2022-03-07 16:05:00 raw_5_mins
14 wind_dir_sd 2021-06-16 11:50:00 2022-03-07 16:05:00 raw_5_mins
17 wind_speed 2021-06-16 11:50:00 2022-03-07 16:05:00 raw_5_mins
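To make the ‘phid’ link concrete, the sketch below joins a few rows copied from the two tables above with a data.table join. Because the same ‘phid’ (e.g., 3 for ‘rain_tot’) appears in several tables, the join uses both ‘phid’ and ‘table_name’. This is an illustration of the relationship, not a function provided by ‘ipayipi’.

```r
library(data.table)

# A few rows copied from the 'phens' table above
phens <- data.table(
  phid = c(3L, 3L, 20L),
  phen_name = c("rain_tot", "rain_tot", "pressure_atm"),
  units = c("mm", "mm", "kpa"),
  table_name = c("raw_1_days", "raw_1_hours", "raw_1_hours")
)

# Matching rows copied from the phenomena data summary above
phen_data_summary <- data.table(
  phid = c(3L, 3L, 20L),
  start_dttm = as.POSIXct(c("2021-06-16 00:00:00", "2021-06-16 11:00:00",
    "2021-06-16 11:00:00"), tz = "UTC"),
  end_dttm = as.POSIXct(c("2022-03-06 00:00:00", "2022-03-07 15:00:00",
    "2022-03-07 15:00:00"), tz = "UTC"),
  table_name = c("raw_1_days", "raw_1_hours", "raw_1_hours")
)

# Join on phid AND table_name: phid alone is not unique across tables
linked <- phens[phen_data_summary, on = c("phid", "table_name")]
linked[, .(phid, phen_name, units, table_name, start_dttm, end_dttm)]
```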

Phenomena append notes: when appending phenomena tables with overlapping data, ‘ipayipi’ examines each overlapping phenomenon series in turn and overwrites either the new data (e.g., additions to a station file) or the existing station file data. Missing data (NAs), however, will not be overwritten. The phenomena data summary keeps a temporal record of how phenomena have been appended; these records support later processing steps further down the pipeline.
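The overlap rule can be pictured with a small data.table sketch. It assumes that the choice of which side wins is a simple switch (‘prefer’, a hypothetical parameter). It also reads the NA note as "an NA in one series never replaces a value in the other". ‘append_phen()’ is a hypothetical helper for illustration, not an ‘ipayipi’ function.

```r
library(data.table)

# Hypothetical helper: merge one phenomenon series ('incoming') into the
# matching station file series ('station'); both have date_time and value.
# Assumed rule: at overlapping time stamps the preferred side wins,
# but an NA never replaces a value from the other side.
append_phen <- function(station, incoming, prefer = c("new", "station")) {
  prefer <- match.arg(prefer)
  m <- merge(station, incoming, by = "date_time", all = TRUE,
    suffixes = c("_stn", "_new"))
  if (prefer == "new") {
    m[, value := fcoalesce(value_new, value_stn)]
  } else {
    m[, value := fcoalesce(value_stn, value_new)]
  }
  m[, .(date_time, value)]
}

# Made-up hourly values that overlap at 12:00 and 13:00
stamps <- as.POSIXct("2021-06-16 11:00:00", tz = "UTC") + 3600 * 0:3
station <- data.table(date_time = stamps[1:3], value = c(1, 2, 3))
incoming <- data.table(date_time = stamps[2:4], value = c(25, NA, 40))

append_phen(station, incoming)$value             # 1 25 3 40
append_phen(station, incoming, "station")$value  # 1 2 3 40
```

In both cases the NA at 13:00 in the incoming data leaves the station value (3) in place, and the 14:00 value, which only exists in the incoming data, is appended.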

Missing data—gaps

Checking for data ‘gaps’ in continuous data streams can be fairly straightforward: just highlight the missing/NA values. With discontinuous or event-based data, however, things are more nuanced. gap_eval_batch() identifies gap periods, i.e., periods when a logger was not recording, in both continuous and discontinuous time-series data. Here the data are continuous, so identifying gaps is simple.
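For continuous data, a gap can be found by comparing the time between consecutive records with the expected recording interval. The sketch below does this for a single series. ‘find_gaps()’ and its arguments are hypothetical. The rule for flagging a "problem" gap (time between records above ‘problem_thresh_s’) is an assumption, not necessarily how gap_eval_batch() decides. Placing the gap one interval inside the bounding records is consistent with the hourly and five-minute rows of the gaps table below.

```r
library(data.table)

# Hypothetical helper: find gaps in one continuous series.
# A gap is where consecutive records are more than 'interval_s' seconds apart.
find_gaps <- function(date_time, interval_s, problem_thresh_s = interval_s) {
  dttm <- sort(unique(date_time))
  d <- as.numeric(diff(dttm), units = "secs")  # seconds between records
  i <- which(d > interval_s)                   # index of record before each gap
  data.table(
    gap_start = dttm[i] + interval_s,     # first missing time stamp
    gap_end = dttm[i + 1L] - interval_s,  # last missing time stamp
    dt_diff_s = d[i],                     # time between the bounding records
    # assumed rule: flag gaps whose bounding records exceed the threshold
    problem_gap = d[i] > problem_thresh_s
  )
}

# Made-up hourly time stamps: records at 11:00-13:00, then 17:00-18:00
stamps <- as.POSIXct("2021-06-16 11:00:00", tz = "UTC") + 3600 * c(0, 1, 2, 6, 7)
g <- find_gaps(stamps, interval_s = 3600)
g
```

Here the single gap runs from 14:00 to 16:00, and ‘dt_diff_s’ is 14400 s (13:00 to 17:00).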

gap_eval_batch(pipe_house, cores = 3)
## [1] "/tmp/Rtmp4GQshS/sf/mcp_vasi_science_centre_aws.ipip/data_summary"
## [1] "/tmp/Rtmp4GQshS/sf/mcp_vasi_science_centre_aws.ipip/data_summary"
## [1] "/tmp/Rtmp4GQshS/sf/mcp_vasi_science_centre_aws.ipip/gaps"
## [1] "/tmp/Rtmp4GQshS/sf/mcp_vasi_science_centre_aws.ipip/phen_data_summary"
## [1] "/tmp/Rtmp4GQshS/sf/mcp_vasi_science_centre_aws.ipip/phens"
# read in the station file
sf <- readRDS(file.path("ipayipi/data-raw/eg_data/met_eg",
  "d4_ipip_room/mcp_vasi_science_centre_aws.ipip"))
sf$gaps
##      gid    eid gap_type   phen  table_name           gap_start             gap_end    dt_diff_s gap_problem_thresh_s problem_gap  notes
##    <int> <lgcl>   <char> <char>      <char>              <POSc>              <POSc>   <difftime>                <num>      <lgcl> <char>
## 1:     1     NA     auto logger  raw_1_days 2022-01-12 00:00:00 2022-02-08 00:00:00 2505600 secs                86400        TRUE   <NA>
## 2:     1     NA     auto logger raw_1_hours 2022-01-12 11:00:00 2022-02-09 15:00:00 2440800 secs                 3600        TRUE   <NA>
## 3:     1     NA     auto logger  raw_5_mins 2022-01-12 11:55:00 2022-02-09 16:00:00 2434500 secs                 1500        TRUE   <NA>
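Note that ‘dt_diff_s’ is not simply gap_end minus gap_start. Working back from the data summary rows above, the last raw_1_hours record before the gap is 2022-01-12 10:00 and the next is 2022-02-09 16:00. For raw_5_mins these are 11:50 and 16:05. The reported ‘dt_diff_s’ values match the time between those bounding records, while gap_start and gap_end sit one recording interval inside them. This is an inference from these numbers, not a documented definition, and can be checked in base R:

```r
# Bounding records copied from the data summary table above
# (first element: raw_1_hours, second element: raw_5_mins)
last_rec <- as.POSIXct(c("2022-01-12 10:00:00", "2022-01-12 11:50:00"), tz = "UTC")
next_rec <- as.POSIXct(c("2022-02-09 16:00:00", "2022-02-09 16:05:00"), tz = "UTC")
interval_s <- c(3600, 300)

# Time between the bounding records, in seconds
dt_diff_s <- as.numeric(difftime(next_rec, last_rec, units = "secs"))
# Gap limits one recording interval inside the bounding records
gap_start <- last_rec + interval_s
gap_end <- next_rec - interval_s

dt_diff_s  # 2440800 2434500
```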

Note that the table above and the graph below correctly show that data from 12 January to 9 February 2022 are missing in all raw data tables; we omitted importing these data using the ‘unwanted’ parameter in the import data stage above. Hover over the graph to interact with it.

p <- dta_availability(pipe_house)
## [1] "/tmp/Rtmp4GQshS/sf/mcp_vasi_science_centre_aws.ipip/data_summary"
## [1] "/tmp/Rtmp4GQshS/sf/mcp_vasi_science_centre_aws.ipip/gaps"
p <- p$plt + scale_colour_sunset(discrete = TRUE) +
      labs(color = "Station") +
      theme(legend.position = "none")
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
plotly::ggplotly(p)

Gaps are highlighted in dark red.